< Warning >

This warning is based upon the article by Victor Volkman in The C Users Group Newsletter, Sept/Oct 1987, and upon a letter from Michael Yokoyama.

* Victor Volkman *

The public domain (PD) lex program distributed by CUG has several syntax-level incompatibilities with the original Unix system specifications.

First, in the definitions section Unix lex specifies:

    "Any line in this section not contained between %{ and %}, and
    beginning in column 1, is assumed to define Lex substitution
    strings. The format of such strings is

        name translation

    and it causes the string given as a translation to be associated
    with the name. The name and translation must be separated by at
    least one blank or tab, and the name must begin with a letter."
    (Unix lex, Section 6)

But in PD lex, the syntax of a name definition requires more delimiters than Unix lex:

    "A definition has the form:

        expression_name = regular_expression;

    where a name is composed of a lower-case letter followed by a
    sequence of letters and digits, and where an underscore is a
    letter." (PD lex, Section 2.4)

The additional syntax overhead and the lower-case restriction mean that existing Unix lex specification files must be modified for PD lex. For example, the Unix lex definition

    letter [a-zA-Z]

becomes

    letter = [a-zA-Z];

Second, another syntax disparity occurs in the rules section. The rules section is a table of regular expressions and their corresponding actions. Unix lex places no restriction on the case of the names used in regular expressions. PD lex, however, enforces an upper-case restriction on regular expressions, similar to its lower-case restriction in the definitions section:

    "Outside a string, a sequence of upper-case letters stands for a
    sequence of the equivalent lower-case letters, while a sequence of
    lower-case letters is taken as the name of a LEX expression."
    (PD lex, Section 2.1)

This means that an existing Unix lex rule like

    while   {return(WHILE);}

must be changed to:

    WHILE   {return(WHILE);}

Unix lex users will immediately notice the absence of a string variable called yytext, which contains the text of the actual pattern match. The Unix lex documentation describes it explicitly:

    "In more complex actions, the user will often want to know the
    actual text that matched some expression like [a-z]+. Lex leaves
    this text in an external character array named yytext."
    (Unix lex, Section 4)

PD lex, however, does not provide this built-in yytext variable. Fortunately, the file GETTOK.C provides a routine, gettoken(), which must be called to make yytext available. The following call also sets yyleng properly:

    yyleng = gettoken(yytext, sizeof yytext);

The PD lex processor does not support all of the regular expression operators that Unix lex supports. The PD lex documentation simply does not mention these operators. When they are used in a lex file they produce no errors, but they do not match anything either. The missing operators, as listed in Unix lex, Section 12, are:

    operator   semantics
    .          any character but newline
    ^x         an x at the beginning of a line
    <y>x       an x when Lex is in start condition y
    x$         an x at the end of a line
    x?         an optional x
    x+         1, 2, 3, ... instances of x
    x{m,n}     m through n occurrences of x

Lastly, the IBM-PC adaptation is severely limited in the workspace it has for building the lex tables. Because it is a small-model implementation, lex is limited to rules that it can process within a 64K data segment. A large-model implementation, while somewhat slower, would allow lex to use the full 640K of memory. A future release of PD lex should be considered to remove this restriction.

Depending upon how strict your compiler's error checking is, you may see several warnings and errors for the sample lex and C files provided. For example,
the Lattice C compiler insists that all externally declared items match their declarations. This means that a procedure declared void in lex.h must also be declared void in the C files.

Another caveat concerns calls to functions that take a variable number of arguments. Many C compilers that handle scanf() and printf() correctly will not generate correct code for user-written functions with a variable number of arguments. Functions like lexerror() should be normalized to a fixed number of arguments (e.g. four arguments).

Also, be aware that passing structures as function arguments is implementation dependent. Passing the address of a structure, via a pointer or the address operator (&), is highly recommended.

Unless you are using the DeSmet C compiler (C88), you should use the stdio.h file that came with your compiler. The DeSmet C stdio.h relies on the low-level MS-DOS implementation of file handles as integers. Compilers such as Lattice C use structures such as _iobuf (instead of integers) for higher-level I/O functions like scanf() and putc().

Perhaps the biggest danger is the cavalier assumption that integers and pointers are equivalent. Kernighan and Ritchie thought this abuse important enough to devote a section (5.6) of "The C Programming Language" to discouraging it. Experienced MS-DOS programmers will note that pointers and integers are equivalent ONLY in the small model (less than 64K code, less than 64K data). The assumption fails with large-model compilers (more than 64K code, more than 64K data), where pointers are 32 bits. Working with long (32-bit) integers can help alleviate this problem.

* Michael Yokoyama *

PD Lex differs from UNIX Lex in that the translation in a definition may be omitted in UNIX Lex but is required in PD Lex. One technique that seems to work is to provide the entry [\0-\0377]; where UNIX Lex would leave the translation blank. Example:

    UNIX Lex            PD Lex
    letter [a-zA-Z]     letter = [a-zA-Z];
    digit  [0-9]        digit = [0-9];
    other               other = [\0-\0377];

If you find any more differences between Unix Lex and PD Lex, please feel free to write or call us.

CUG